Filename: 276-lower-bw-granularity.txt
Title: Report bandwidth with lower granularity in consensus documents
Author: Nick Mathewson
Created: 20-Feb-2017
Status: Open
Target: 0.3.1.x-alpha

1. Overview

   This document proposes that, in order to limit the bandwidth needed for
   networkstatus diffs, we lower the granularity with which bandwidth is
   reported in consensus documents.

   Making this change will reduce the total volume of compressed ed-style
   consensus diff downloads by around 10%.

2. Motivation

   Consensus documents currently report bandwidth values as the median
   of the measured bandwidth values in the votes.  (Or as the median of
   all votes' values if there are not enough measurements.)  And when
   voting, in turn, authorities simply report whatever measured value
   they most recently encountered, clipped to 3 significant base-10
   figures.
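   To make the current pipeline concrete, here is a minimal Python sketch
   of it.  The helper names are hypothetical, and three details are
   assumptions not confirmed by this proposal: "clipped" is read as
   truncation toward zero (so 12345 becomes 12300), an even number of
   values uses the lower median, and min_measured=3 stands in for
   "enough measurements".

```python
from typing import Sequence


def clip_sig_figs(value: int, figures: int = 3) -> int:
    """Truncate a non-negative integer to `figures` significant decimal digits.

    Assumption: "clipped" means truncation toward zero, e.g. 12345 -> 12300.
    """
    digits = len(str(value))
    if digits <= figures:
        return value
    scale = 10 ** (digits - figures)
    return (value // scale) * scale


def low_median(values: Sequence[int]) -> int:
    """Lower median: with an even count, take the lower of the two middle values."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def consensus_bandwidth(vote_values: Sequence[int],
                        measured_values: Sequence[int],
                        min_measured: int = 3) -> int:
    """Median of measured values if there are enough, else median of all votes.

    min_measured is a hypothetical threshold for "enough measurements".
    """
    if len(measured_values) >= min_measured:
        return low_median(measured_values)
    return low_median(vote_values)


# Four authorities report measurements, each clipped to 3 significant figures.
measured = [clip_sig_figs(v) for v in (12345, 12401, 12388, 12290)]
print(consensus_bandwidth([20000, 15000, 9000, 11000], measured))  # 12300
```

   Because each authority reports whatever it measured most recently, even
   three significant figures leave room for small, frequent changes in the
   median.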

   This means that, from one consensus to the next, these weights change
   very often and with little significance: a large fraction of bandwidth
   transitions are under 2% in magnitude.

   As we begin to use consensus diffs, each change will take space to
   transmit.  So reducing the number of changes will lower client
   bandwidth requirements significantly.

3. Proposal

   I propose that we round the bandwidth values, as they are placed in votes,
   to no more than two significant digits.  In addition, for
   values beginning with decimal "2" through "4", we should round the
   first two digits to the nearest multiple of 2.  For values beginning
   with decimal "5" through "9", we should round to the nearest multiple
   of 5.
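   A minimal Python sketch of the proposed rounding follows.  It is
   illustrative, not the authorities' implementation: round_bandwidth is a
   hypothetical name, and the sketch assumes that the leading digit is read
   before rounding, that ties round half up, and that single-digit values
   are left unchanged.

```python
def round_bandwidth(value: int) -> int:
    """Round a non-negative bandwidth value as proposed (sketch).

    Leading digit 1: keep two significant digits.
    Leading digit 2-4: first two digits go to the nearest multiple of 2.
    Leading digit 5-9: first two digits go to the nearest multiple of 5.
    Assumptions: ties round half up; values below 10 are left unchanged.
    """
    if value < 10:
        return value  # one significant digit already; nothing to round
    digits = len(str(value))
    scale = 10 ** (digits - 2)         # place value of the second digit
    leading = value // 10 ** (digits - 1)
    if leading == 1:
        step = 1
    elif leading <= 4:
        step = 2
    else:
        step = 5
    unit = step * scale                # rounding granularity for this value
    # Integer round-half-up to the nearest multiple of `unit`.
    return ((value + unit // 2) // unit) * unit


for bw in (1234, 2345, 7234, 56789, 9876):
    print(bw, "->", round_bandwidth(bw))
# 1234 -> 1200
# 2345 -> 2400
# 7234 -> 7000
# 56789 -> 55000
# 9876 -> 10000
```

   Note that rounding can carry into the next power of ten (9876 becomes
   10000), which is harmless: the result is still the nearest allowed
   value.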

   The change will take effect progressively as authorities upgrade: since
   the median value is used, when one authority upgrades, 1/5 of the
   bandwidths will be rounded (on average).

   Once all authorities upgrade, all bandwidths will be rounded like this.

4. Analysis

   The rounding proposed above will not change any value by more than 5%
   beyond the current rounding, so the overall impact on bandwidth
   balancing should be small.
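   The 5% bound can be checked directly.  As a sketch, assume
   round-to-nearest at the granularity chosen by the leading digit, and
   write v for a value with d >= 2 decimal digits and leading digit a,
   s for the place value of its second digit, k for the rounding step,
   and r for the rounded result (these symbols are introduced here):

```latex
\begin{aligned}
s &= 10^{d-2}, \qquad
k = \begin{cases} 1 & a = 1 \\ 2 & a \in \{2,3,4\} \\ 5 & a \in \{5,\dots,9\} \end{cases} \\
r &= k s \cdot \operatorname{round}\!\left(\frac{v}{k s}\right), \qquad
|r - v| \le \frac{k s}{2}, \qquad
v \ge a \cdot 10^{d-1} = 10\,a\,s \\
\frac{|r - v|}{v} &\le \frac{k s / 2}{10\,a\,s} = \frac{k}{20\,a}
  \le \max\!\left(\frac{1}{20}, \frac{2}{40}, \frac{5}{100}\right) = \frac{1}{20} = 5\%
\end{aligned}
```

   The worst case in each band is its smallest leading digit (1, 2, or 5),
   and all three give exactly 1/20, so the step sizes 1, 2, and 5 are
   balanced: no band loses more precision than the others.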

   In order to assess the bandwidth savings of this approach, I
   smoothed the January 2017 consensus documents' Bandwidth fields,
   using scripts from [1].  I found that if clients download
   consensus diffs once an hour, they can expect 11-13% mean savings
   after xz or gz compression.  For two-hour intervals, the savings
   are 8-10%; for three-hour or four-hour intervals, they are only
   6-8%.  After that point, we start seeing diminishing returns,
   with only 1-2% savings on a 72-hour interval's diff.
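   The shape of this measurement can be approximated on synthetic data.
   The sketch below is not the scripts from [1].  It uses Python's difflib
   unified diffs as a stand-in for Tor's ed-style consensus diffs, a
   simplified body of one "w Bandwidth=" line per relay, a hypothetical
   round_two_sig smoother that rounds to two significant digits without
   the multiple-of-2/5 step, and invented values that drift by up to 2%
   between consensuses.

```python
import difflib
import gzip
import lzma
import random
from typing import Callable, List, Tuple


def round_two_sig(value: int) -> int:
    """Hypothetical stand-in smoother: round half up to two significant digits.

    Unlike the proposal, this skips the multiple-of-2 / multiple-of-5 step.
    """
    if value < 100:
        return value  # already at most two significant digits
    unit = 10 ** (len(str(value)) - 2)
    return ((value + unit // 2) // unit) * unit


def bandwidth_lines(values: List[int], smooth: Callable[[int], int]) -> List[str]:
    # Simplified consensus body: one "w Bandwidth=" line per relay.
    return [f"w Bandwidth={smooth(v)}\n" for v in values]


def diff_cost(old: List[int], new: List[int],
              smooth: Callable[[int], int]) -> Tuple[int, int, int]:
    """Return (changed lines, gzip bytes, xz bytes) for the old -> new diff."""
    a = bandwidth_lines(old, smooth)
    b = bandwidth_lines(new, smooth)
    # Unified diff stands in for Tor's ed-style consensus diffs.
    diff = "".join(difflib.unified_diff(a, b, n=0)).encode()
    changed = sum(1 for x, y in zip(a, b) if x != y)
    return changed, len(gzip.compress(diff)), len(lzma.compress(diff))


# Invented data: 2000 relays whose values drift by at most 2% per consensus.
rng = random.Random(2017)
old = [rng.randint(20, 100_000) for _ in range(2000)]
new = [max(1, round(v * rng.uniform(0.98, 1.02))) for v in old]

raw = diff_cost(old, new, lambda v: v)
smoothed = diff_cost(old, new, round_two_sig)
print("raw      (changed, gz, xz):", raw)
print("smoothed (changed, gz, xz):", smoothed)
```

   Because rounding is a deterministic function of each value, a smoothed
   line can only change when its raw line changes, so the changed-line
   count never increases; the compressed sizes show how much of that
   reduction survives compression.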

    [1] https://github.com/nmathewson/consensus-diff-analysis

5. Open questions

   Is there a greedier smoothing algorithm that would produce better
   results?

   Is there any reason to think this amount of smoothing would not
   be safe?

   Would a time-aware smoothing mechanism work better?
